Problem Note 58720: Writing to the SAS® Scalable Performance Data Engine format fails on Hive data sets that are stored in HCatalog formats
If you use SAS high-performance procedures to read a Hive table that is stored in an HCatalog file format and then write it to the SAS SPD Engine format, the write fails with a java.lang.NoClassDefFoundError.
Example
In the following example, SAS creates a file in Hive with an HCatalog storage format. This example uses the ORC format; other HCatalog formats include Avro and Parquet.
/* Hive libname */
libname hivelib hadoop server='myhiveserver.example.com' user='testuser';
/* Write sashelp.class to Hive and store as an ORC file */
data hivelib.orcfile(DBCREATE_TABLE_OPTS='STORED AS ORC');
   set sashelp.class;
run;
Next, the HPDS2 procedure reads this ORC file and writes it in parallel to the SPD Engine format.
/* SPDE libname */
libname spdelib spde "/user/testuser/spde" hdfshost=default;
/* Read the ORC file from Hive and write to SPDE */
proc hpds2 in=hivelib.orcfile out=spdelib.outfile;
   data ds2gtf.out;
      method run();
         set ds2gtf.in;
      end;
   enddata;
run;
If the system option MSGLEVEL=I is set (options msglevel=i;), INFO messages in the log show which Hadoop configuration files the SPD Engine operation reads. The main Hadoop service files in the directory that the SAS_HADOOP_CONFIG_PATH environment variable points to are read, but the hive-site.xml file is not:
INFO: Read the content of utility file /opt/sas/hadoop/conf/core-site.xml.
INFO: Read the content of utility file /opt/sas/hadoop/conf/hdfs-site.xml.
INFO: Read the content of utility file /opt/sas/hadoop/conf/mapred-site.xml.
INFO: Read the content of utility file /opt/sas/hadoop/conf/yarn-site.xml.
Finally, error messages in the log show a java.lang.NoClassDefFoundError:
NOTE: The HPDS2 procedure is executing in the distributed computing environment with 3 worker nodes.
ERROR: at org.apache.hadoop.mapreduce.Job.submit(Job.java:1292)
ERROR: at com.dataflux.hadoop.DFHadoopMapReduce$1.run(DFHadoopMapReduce.java:424)
ERROR: at java.security.AccessController.doPrivileged(Native Method)
ERROR: at javax.security.auth.Subject.doAs(Subject.java:415)
ERROR: at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
ERROR: at com.dataflux.hadoop.DFHadoopMapReduce.runMapReduce(DFHadoopMapReduce.java:310)
ERROR: Caused by: java.lang.reflect.InvocationTargetException
ERROR: at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
ERROR: at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
ERROR: at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
ERROR: at java.lang.reflect.Method.invoke(Method.java:606)
ERROR: at com.sas.access.hadoop.ep.utils.ClassCreationHelper.createUserInputFormat(ClassCreationHelper.java:215)
ERROR: ... 17 more
ERROR: Caused by: com.google.common.util.concurrent.ExecutionError:
java.lang.NoClassDefFoundError: javax/jdo/JDOException
etc.
Workaround
- In the directory that the SAS_HADOOP_JAR_PATH environment variable points to, create a subdirectory named conf.
- Then, copy your cluster's hive-site.xml file into that subdirectory, so that it is located at $SAS_HADOOP_JAR_PATH/conf/hive-site.xml.
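The two steps above can be sketched at a shell prompt. This is a minimal illustration, not an exact procedure: the default paths below are assumptions, and the source location of hive-site.xml (often the SAS_HADOOP_CONFIG_PATH directory) varies by site.

```shell
# Sketch of the workaround. Both paths are illustrative assumptions:
# substitute your site's actual SAS_HADOOP_JAR_PATH and the directory
# that holds your cluster's hive-site.xml.
SAS_HADOOP_JAR_PATH=${SAS_HADOOP_JAR_PATH:-/opt/sas/hadoop/jars}
SAS_HADOOP_CONFIG_PATH=${SAS_HADOOP_CONFIG_PATH:-/opt/sas/hadoop/conf}

# Create the conf subdirectory under the JAR path, then copy the file.
mkdir -p "$SAS_HADOOP_JAR_PATH/conf"
cp "$SAS_HADOOP_CONFIG_PATH/hive-site.xml" "$SAS_HADOOP_JAR_PATH/conf/"
```

After the file is in place, rerun the HPDS2 step in a new SAS session so that the SPD Engine picks up the Hive configuration.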
Click the Hot Fix tab in this note to access the hot fix for this issue.
Operating System and Release Information
Product Family | Product | System | Product Release Reported | Product Release Fixed* | SAS Release Reported | SAS Release Fixed*
SAS System | Base SAS | Microsoft® Windows® for x64 | 9.4_M3 | 9.4_M4 | 9.4 TS1M3 | 9.4 TS1M4
 | | 64-bit Enabled AIX | 9.4_M3 | 9.4_M4 | 9.4 TS1M3 | 9.4 TS1M4
 | | 64-bit Enabled Solaris | 9.4_M3 | 9.4_M4 | 9.4 TS1M3 | 9.4 TS1M4
 | | HP-UX IPF | 9.4_M3 | 9.4_M4 | 9.4 TS1M3 | 9.4 TS1M4
 | | Linux for x64 | 9.4_M3 | 9.4_M4 | 9.4 TS1M3 | 9.4 TS1M4
 | | Solaris for x64 | 9.4_M3 | 9.4_M4 | 9.4 TS1M3 | 9.4 TS1M4
* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.
This SAS Note describes a failure that occurs when you write Hive HCatalog data sets to SPD Engine format.
Type: | Problem Note |
Priority: | high |
Date Modified: | 2016-08-15 13:29:55 |
Date Created: | 2016-08-02 12:00:04 |